Basic Lexicon and Shared Ontology for Multilingual Resources: A SUMO+MILO Hybrid Approach
Abstract
A common conceptual infrastructure is crucial for multilingual language processing and documentation. Global Wordnet (GWN) was proposed as the common infrastructure for linguistically motivated conceptual representations for all languages. Two critical issues in this line of research are the scarcity of lexical semantic information (especially from endangered languages) and the lack of a shared conceptual core as the basis of multilingual conceptual representation. In this paper, we elaborate and formalize the proposal to build a shared core ontology based on the Swadesh list as a solution to these two issues. Comparing Swadesh lists from different languages allowed us to build a small shared ontology that reflects direct human experience and can serve as the cross-lingual conceptual core. These micro-ontologized lexicons can be used as seeds for developing a fuller, more comprehensive documentation of the linguistically motivated ontology of each language. In terms of formalization, we propose that SUMO+MILO has the appropriate level of abstraction and coverage for mapping from the basic lexicon to a formal ontology.
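The idea of comparing Swadesh lists to obtain a shared core can be illustrated with a minimal sketch. It is not the authors' procedure: the three toy word lists, the function names `shared_concepts` and `build_core`, and the gloss-to-class table `ONTOLOGY_CLASS` are all assumptions made for illustration, and the class labels are placeholders rather than verified SUMO or MILO terms.

```python
from typing import Dict, Optional, Set

# Toy Swadesh-style lists (illustrative only): concept gloss -> word form.
SWADESH: Dict[str, Dict[str, str]] = {
    "English": {"water": "water", "hand": "hand", "sun": "sun", "dog": "dog"},
    "German": {"water": "Wasser", "hand": "Hand", "sun": "Sonne"},
    "Mandarin": {"water": "shuǐ", "hand": "shǒu", "sun": "tàiyáng", "dog": "gǒu"},
}

# Hypothetical gloss -> ontology class table. The labels are placeholders,
# not verified SUMO/MILO class names.
ONTOLOGY_CLASS: Dict[str, str] = {"water": "Water", "hand": "Hand", "sun": "Sun"}


def shared_concepts(lists: Dict[str, Dict[str, str]]) -> Set[str]:
    """Return the concept glosses attested in every language's list."""
    gloss_sets = [set(entries) for entries in lists.values()]
    return set.intersection(*gloss_sets) if gloss_sets else set()


def build_core(
    lists: Dict[str, Dict[str, str]], mapping: Dict[str, str]
) -> Dict[str, Dict[str, object]]:
    """Attach an ontology class and per-language word forms to each shared gloss."""
    core: Dict[str, Dict[str, object]] = {}
    for gloss in sorted(shared_concepts(lists)):
        ontology_class: Optional[str] = mapping.get(gloss)  # None if unmapped
        core[gloss] = {
            "class": ontology_class,
            "lexicalizations": {lang: entries[gloss] for lang, entries in lists.items()},
        }
    return core


if __name__ == "__main__":
    for gloss, info in build_core(SWADESH, ONTOLOGY_CLASS).items():
        print(gloss, info)
```

In this sketch a concept enters the core only if every list lexicalizes it, so "dog" is dropped because the toy German list omits it. A real mapping would also have to handle polysemy and partial overlap between glosses, which plain set intersection ignores.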
Similar resources
Linking FrameNet to the Suggested Upper Merged Ontology
Deductive reasoning with natural language requires combining lexical resources with the world knowledge provided by ontologies. In this paper we describe the connection of FrameNet – a lexicon for English – to the Suggested Upper Merged Ontology (SUMO). We align FrameNet Semantic Types (ST) with SUMO classes, which we express in SUO-KIF, the language of SUMO. Based on this general-domain alignm...
A Multilingual Lexico-Semantic Database and Ontology
We discuss the development of a multilingual lexicon linked to a formal ontology. First we describe the Open Multilingual Wordnet, a multilingual wordnet with twenty two languages and a rich structure of semantic relations. It is made by exploiting links from various monolingual wordnets to the English Wordnet. Currently, it contains 118,337 concepts expressed in 1,643,260 senses in 22 language...
Paradigmatic Morphology and Subjectivity Mark-Up in the RoWordNet Lexical Ontology
Lexical ontologies are fundamental resources for any linguistic application with wide coverage. The reference lexical ontology is the ensemble made of Princeton WordNet, a huge semantic network, and SUMO&MILO ontology, the concepts of which are labelling each synonymic series of Princeton WordNet. This lexical ontology was developed for English language, but currently there are more than 50 sim...
Web 2.0, Language Resources and standards to automatically build a multilingual Named Entity Lexicon
This paper proposes to advance in the current state-of-the-art of automatic Language Resource (LR) building by taking into consideration three elements: (i) the knowledge available in existing LRs, (ii) the vast amount of information available from the collaborative paradigm that has emerged from the Web 2.0 and (iii) the use of standards to improve interoperability. We present a case study in ...
Multilingual Conceptual Access to Lexicon based on Shared Orthography: An ontology-driven study of Chinese and Japanese
In this paper we propose a model for conceptual access to multilingual lexicon based on shared orthography. Our proposal relies crucially on two facts: That both Chinese and Japanese conventionally use Chinese orthography in their respective writing systems, and that the Chinese orthography is anchored on a system of radical parts which encodes basic concepts. Each orthographic unit, called han...
Publication date: 2007